Skip to content

Try to cache file hashes#133

Open
krlmlr wants to merge 3 commits intomasterfrom
hash_files
Open

Try to cache file hashes#133
krlmlr wants to merge 3 commits intomasterfrom
hash_files

Conversation

@krlmlr
Copy link
Copy Markdown
Collaborator

@krlmlr krlmlr commented Oct 20, 2016

Within a session, this should avoid recomputing file hashes where the file size and mtime have not changed. This exploits remake's internal caching.

Fixes #110.

Within a session, this should avoid recomputing file hashes where
the file size and mtime have not changed.  This exploits remake's
internal caching, which is not tested in this commit (and this
entirely lacks any sort of integration test).

For #110
Copy link
Copy Markdown
Collaborator Author

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

expect_equal(st$db$get(filename)$hash, h)

## Does not call back to the underlying hash function:
with_mock("hash_files" = lava,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to set "remake::hash_files" so that tests work in R CMD check. (Don't know why.)

expect_false(st$exists(file))
})

test_that("caching file store", {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about splitting the test so that each test_that() call tests only one aspect of the behavior?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be better style. It can get a bit tedious though as there's so much side effect heavy bits here (remake is basically all side effect)

## Then, we could change the file *size* and force rehashing.
writeBin(bytes[-1], filename)
h2 <- digest::digest(bytes[-1], serialize = FALSE)
with_mock("hash_files" = lava,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

self$db$set(filename, info)
info$hash
} else {
stop(sprintf("file %s not found in file store", filename))
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling abnormal conditions first saves one indent, but that's a matter of style.

))

file_info <- function(filename) {
info <- file.info(filename, extra_cols = FALSE)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function support len(filename) != 1? How about renaming to file_mtime_and_size() or similar?

}
}
info$hash <- hash_files(filename, named = FALSE)
self$db$set(filename, info)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would inserting the whole object at once provide better encapsulation?

self$db$set(filename, list(info = info))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants